ABSTRACT

This review presents detailed information about the 
structure of triplet repeat RNA and addresses the 
simple sequence repeats of normal and expanded 
lengths in the context of the physiological and 
pathogenic roles played in human cells. First, we 
discuss the occurrence and frequency of various 
trinucleotide repeats in transcripts and classify 
them according to the propensity to form RNA 
structures of different architectures and stabilities. 
We show that repeats capable of forming hairpin 
structures are overrepresented in exons, which 
implies that they may have important functions. 
We further describe long triplet repeat RNA as a 
pathogenic agent by presenting human neurological 
diseases caused by triplet repeat expansions in 
which mutant RNA gains a toxic function. 
Prominent examples of these diseases include
myotonic dystrophy type 1 and fragile
X-associated tremor ataxia syndrome, which are
triggered by mutant CUG and CGG repeats,
respectively. In addition, we discuss RNA-mediated
pathogenesis in polyglutamine disorders such as
Huntington's disease and spinocerebellar ataxia
type 3, in which expanded CAG repeats may act as
an auxiliary toxic agent. Finally, triplet repeat RNA is
presented as a therapeutic target. We describe
various concepts and approaches aimed at the
selective inhibition of mutant transcript activity in
experimental therapies developed for
repeat-associated diseases.



INTRODUCTION 

In the early 1990s, the identification of a new class of 
disease-causing mutations caused considerable excitement 
in the community of human molecular geneticists. The 
mutations were inherited trinucleotide repeat (TNR) ex- 
pansions, and the associated disorders became known as 
Trinucleotide Repeat Expansion Diseases (TREDs) (1). 
Over 20 neurological diseases have now been assigned to
this group. Each disease is associated with a single
defective gene, which triggers the process of pathogenesis
through aberrant expression or toxic properties of
mutant transcripts or proteins [reviewed in (2-4)].
Although researchers have been making efforts to 
develop treatments for TREDs for nearly two decades, 
they remain incurable. 

TREDs include spinal and bulbar muscular atrophy 
(SBMA) (5), fragile X syndrome (FXS) (6), myotonic dys- 
trophy type 1 (DM1) (7), Huntington's disease (HD) (8) 
and a number of spinocerebellar ataxias (SCA) (9,10). The 
first years of research on pathogenic mechanisms in 
TREDs resulted in clear mechanistic separation among 
different groups of the disorders. However, recent 
studies have begun to reveal that mutant RNA and 
mutant protein can act in parallel and exert their toxicities 
independently in some TREDs (11-13). Mutant transcripts
may contribute to the pathogenesis of diseases
driven by mutant proteins (11,12), and mutant proteins
may contribute to the pathogenesis of disorders known
to be driven by toxic RNA (13). Thus, the long-standing
borders between distinct pathomechanisms in TREDs 
are beginning to be crossed, and this crossing occurs in 
both directions. 

Much of the recent excitement brought to the field of 
TREDs may be attributed to the rapid progress of 
research on various approaches to treat these diseases
(14-16). All the approaches discussed here are aimed at 
targeting triplet repeat RNA sequences with the goal of 
disrupting their pathogenic interaction with sequestered 
proteins, inhibiting translation from the mutant allele or 
destroying mutant transcripts. In some of these 
approaches, detailed information on the structure of the 
target RNA is essential for the rational design of potent 
reagents that may become useful therapeutic tools in the 
future. 

In this review, we summarize the results of detailed 
structural studies of triplet repeats present in transcripts 
of TRED genes, in either non-coding or protein coding 
regions. Relevant structural information is given to illus- 
trate involvement of RNA structure in the mechanism of 
pathogenesis triggered by expanded repeats. Important 
recent findings are also presented in the context of 
TNR genomics. Genomic and transcriptomic perspectives
are included to clarify the abundance of various triplet
repeats, i.e. their presence in the cells in which
pathology develops and where selective targeting by
various reagents must occur. The characteristics of
interactions between TRED transcripts and specific
proteins are also presented, as these interactions
determine the downstream adverse effects of TNR 
mutations. 



TRIPLET REPEATS ARE FREQUENT MOTIFS IN 
HUMAN TRANSCRIPTS 

TNRs belong to simple sequence repeats (SSRs), also 
known as short tandem repeats or microsatellites, and 
are common motifs in the genomes of humans and 
many other species (17). The repeats mutate at a very
high rate and are often polymorphic in length, and the
functions proposed for them are related to this variable
length (18). They are copious not only in genomes but
also in transcriptomes, and their abundance may be 
higher than originally thought due to the presence of bi- 
directional transcription across the majority of human 
genes and intergenic regions (19,20). Importantly, in 
translated sequences, TNRs are selected preferentially 
over dinucleotide or tetranucleotide repeats, because the 
length variation of TNRs does not change the reading 
frame (21). 
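
The reading-frame argument above can be checked with a line of modular arithmetic. The following is a minimal sketch; the function name `frame_shift` is introduced here purely for illustration and does not come from the cited work.

```python
def frame_shift(unit_length: int, units_changed: int) -> int:
    """Reading-frame shift (0, 1 or 2) caused by gaining or losing repeat units.

    unit_length   -- length of the repeated motif in nucleotides
    units_changed -- number of units gained (positive) or lost (negative)
    """
    return (unit_length * units_changed) % 3


# Compare di-, tri- and tetranucleotide repeats for gains of 1 to 6 units.
for name, unit_length in [("dinucleotide", 2),
                          ("trinucleotide", 3),
                          ("tetranucleotide", 4)]:
    shifts = [frame_shift(unit_length, k) for k in range(1, 7)]
    print(f"{name:16s} {shifts}")
```

Only the trinucleotide row is zero for every length change, so TNR length polymorphism inserts or deletes whole codons, whereas most length changes in di- and tetranucleotide repeats shift the frame.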

Twenty different TNR motifs may potentially occur in 
RNAs if homotrinucleotide motifs are excluded and dif- 
ferent phases of individual motifs are combined. The great 
abundance of some TNRs in cells raises questions about 
what roles these sequences might play in transcripts (22). 
TNRs differ in length, and the expression levels of their 
host transcripts vary greatly. The structures formed by 
TNRs in transcripts depend not only on the repeat 
motif, and the number of its units, but also on the 
presence of interruptions breaking the homogeneity of 
the repeat tract (23). TNRs that have beneficial structural 
features and functional properties are positively selected 
during evolution, and TNRs with deleterious features are 
selected against (24). 
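
The count of 20 motifs follows from simple enumeration: 64 triplets minus 4 homotrinucleotides leaves 60, and each repeat can be read in three phases. A minimal sketch of that enumeration is given below; the helper `canonical_phase` and the choice of the alphabetically first rotation as the class label are conventions assumed here, not taken from the cited studies.

```python
from itertools import product


def canonical_phase(motif: str) -> str:
    """Label a repeat motif by the alphabetically first of its rotations,
    so that CAG, AGC and GCA (one repeat read in three phases) coincide."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


all_triplets = ["".join(p) for p in product("ACGU", repeat=3)]
# Exclude homotrinucleotides (AAA, CCC, GGG, UUU).
non_homo = [t for t in all_triplets if len(set(t)) > 1]
# Combine the three phases of each repeat into one class.
motif_classes = sorted({canonical_phase(t) for t in non_homo})

print(len(all_triplets), len(non_homo), len(motif_classes))  # 64 60 20
```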



A number of studies have been performed and many 
resources developed to characterize the frequency and 
location of SSRs (including TNRs) in the human 
genome (17,25-27) and exome (28-33). The main ques- 
tions that have been addressed are the following: how 
many human mRNAs contain TNRs? At what fre- 
quency do certain types of TNRs occur? What is the 
length distribution of various TNRs in transcripts? 
What is the preferred location of TNRs in mRNA? 
And what are the known and putative functions of 
these sequences? 

Three independent studies have provided the answers to 
these questions by identifying TNRs in the human genome 
reference sequence (27,29,34). In the most recent study, 
32448 tracts of uninterrupted TNRs composed of six or 
more repeated units were identified using the BLASTn 
algorithm (29). The relative frequencies of different 
TNR types were similar to those reported in earlier 
genome-wide surveys that used different repeat length 
and purity thresholds (25-27,34). As many as 1030 
TNRs were identified in the exonic sequences of 878 
genes. The TNRs that are strongly overrepresented in 
exons are CNG (where N is any nucleotide), CGA and 
AGG, whereas CTT, CAT, CAA, TAA and TTA are 
robustly underrepresented (Figure 1A). The shortest 
tracts are most prevalent, and the frequency of TNRs de- 
creases roughly exponentially with their length. For the 
majority of TNR types, the longest tracts are <20 
repeated units. However, for some TNRs such as AAG, 
several tracts >30 units have been identified (29). Of the 
1030 exonic TNRs, 59% are located in the ORF, 28% are 
in the 5'-UTR and 13% are in the 3'-UTR. The CCA, 
CAG, CTG, CCT, AGG, AAG and ATG TNRs occur 
most frequently in the open reading frame (ORF) 
(~80%); AT-rich TNRs are more frequent in the 3'- 
UTR, whereas CCG and CGG repeats are most 
frequent in the 5'-UTR (52% and 62%, respectively) (29). 
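
To make the search criterion concrete (uninterrupted tracts of six or more units), a minimal sketch is given below. It is not the BLASTn-based procedure of the cited study (29): a regular expression is swapped in as a simpler stand-in, and the function name `find_tnr_tracts` and the toy sequence are assumptions made for illustration only.

```python
import re


def find_tnr_tracts(sequence: str, min_units: int = 6):
    """Return (start, unit, n_units) for each uninterrupted TNR tract.

    A tract is a triplet repeated at least `min_units` times in tandem.
    Homotrinucleotide runs (e.g. AAA...) are skipped. The unit is
    reported in the phase in which the tract is first encountered.
    """
    sequence = sequence.upper()
    # One captured triplet followed by at least (min_units - 1) copies of it.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    tracts = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # homotrinucleotide run, not a TNR
        n_units = (match.end() - match.start()) // 3
        tracts.append((match.start(), unit, n_units))
    return tracts


# Toy sequence: a (CAG)7 tract, a poly-A run and a too-short (CGG)5 tract.
toy = "TT" + "CAG" * 7 + "TT" + "AAA" * 8 + "CGG" * 5
print(find_tnr_tracts(toy))
```

In this toy input only the (CAG)7 tract is reported; the poly-A run is excluded as a homotrinucleotide and the (CGG)5 tract falls below the six-unit threshold.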

To better characterize TNR sequences, their occurrence 
and genetic polymorphism have been investigated (35-38). 
Detailed information about triplet repeat length distribu- 
tion in specific genes has been gathered experimentally for 
CAG and CTG repeats (36). A population genotyping 
study was conducted on 100 human genes selected to 
contain the longest runs of these repeats. The results 
demonstrated that very long and highly polymorphic 
repeat tracts are rare in genes not known to be associated 
with TREDs, which is in agreement with the results of a 
previous bioinformatics survey (23). 

Functional association studies have been performed to 
gain some insight into the roles played by TNR-containing 
genes. It was found that genes coding for (i) proteins
with transcription-related functions, (ii) proteins that
interact with nucleic acids and (iii) proteins with nuclear
localization are generally overrepresented among
TNR-containing genes (27,29,37). These results as well as
other lines of evidence suggest that the functions of
TNRs can be expressed not only through proteins, but
also at the DNA (24,35,39,40) and RNA (29,41-43)
levels. Furthermore, TNR functions in RNA may
strongly depend on the structures adopted by these 
sequences. 
Figure 1. The occurrence of various triplet repeats in the human
transcriptome and their RNA structures. (A) Representation of TNRs
composed of at least six repeat units in RefSeq mRNA sequences 
compared with the whole human genome sequence (17 out of 20 
triplets are shown due to lack of CGT, CTA, TAG repeats in exons). 
Bar colors correspond to the four classes of RNA structure as shown in 
(B); (B) 20 different triplet repeat RNAs belong to four structural 
classes. On the diagram, they are ordered from left to right according 
to the increasing thermodynamic stability of their secondary structure: 
seven TNRs are single stranded even at low temperature (4°C), five 
AU-rich repeats form semi-stable hairpins in physiological conditions, 
the next six GC-rich TNRs form stable hairpin structures and two 
TNRs form the most stable quadruplexes. 



FOUR STRUCTURAL CLASSES OF TRIPLET 
REPEAT RNAs 

To provide a basis for a functional analysis of triplet 
repeats in RNAs, the solution structures of oligoribonu- 
cleotides (ORNs) composed of the reiterations of specific 
triplets have been investigated under different experimen- 
tal conditions using various methods. The ORNs that 
were first analyzed were four CNG motifs (N = A, C, G 
or U) reiterated 17 times (44). All these ORNs were found 
to form hairpin structures as demonstrated by chemical 
and enzymatic structure probing. The stem of the CNG 
repeat hairpin was shown to be composed of periodically 
occurring C-G and G-C base pairs and single N-N base
mismatches. The hairpin loop was formed by either four 
or seven nucleotides. With the exception of the CGG
repeat hairpin, the other three repeated CNG motifs
form 'slippery' hairpins (i.e. they tend to form alternative
alignments unless they are fixed in one conformation by a G-C
clamp) (44). Recently, the structures of all 20 different
triplet motifs repeated either 17 or 20 times were subjected 
to a comparative analysis using biochemical structure 
probing as well as gel mobility analysis, and these struc- 
tures were assigned to four classes (45). As shown in 
Figure 1B, AGG and UGG repeats form the most stable
G-quadruplex structures; CGA, CGU and all CNG 
repeats form hairpins that are more stable than those of 
UAG, AUG, UUA, CUA and CAU repeats, whereas
CAA, UUG, AAG, CUU, CCU, CCA and UAA
repeats do not form any higher order structure. Further
analysis of the hairpin and G-quadruplex structure- 
forming repeats was pursued using biophysical methods. 
UV-monitored structure melting revealed the following 
order of stability for CNG repeat hairpins: CGG > 
CAG > CUG > CCG; the stabilities of the AGG and 
UGG G-quadruplexes are roughly similar. CD spectra 
have shown that both G-quadruplexes are formed by 
parallel RNA strands (45). A shorter version of the
G-quadruplex-forming AGG repeat RNA, (GGA)4, was
also analyzed by NMR and CD (46,47). The intramolecular
G-quadruplex was shown to be formed by a G:G:G:G
tetrad plane and a G(:A):G:G(:A):G hexad plane (47). 
Other studies of triplet repeat ORN structures have 
included UV melting and/or CD studies of all CNG 
repeats or selected sequences of this group (48,49), gel 
mobility analysis of CGG repeats (50,51) and NMR 
studies of CGG repeats (52). 
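
The stem architecture described above for CNG repeat hairpins (periodic C-G and G-C pairs separated by single N-N mismatches, with a terminal loop of four or seven nucleotides) can be reproduced with a purely combinatorial alignment. The sketch below is an idealized model, not a thermodynamic prediction: it assumes a fully aligned, non-slipped hairpin, a minimum loop of four nucleotides, and a loop closed by a Watson-Crick pair. The function name `fold_cng_hairpin` is hypothetical.

```python
def fold_cng_hairpin(n_base: str, repeats: int, min_loop: int = 4):
    """Pair the two arms of a (CNG)n tract antiparallel, end to end.

    Returns (stem, loop): stem is a list of (i, j, bases, kind) tuples
    from the hairpin base towards the loop, kind being 'pair' for C-G
    or G-C and 'mismatch' for N-N; loop is the unpaired terminal loop.
    """
    seq = ("C" + n_base + "G") * repeats
    last = len(seq) - 1
    watson_crick = {("C", "G"), ("G", "C")}
    stem = []
    i = 0
    # Extend the stem while at least `min_loop` nucleotides remain inside.
    while (last - i) - i - 1 >= min_loop:
        five, three = seq[i], seq[last - i]
        kind = "pair" if (five, three) in watson_crick else "mismatch"
        stem.append((i, last - i, five + three, kind))
        i += 1
    # Assumption: the loop must be closed by a real base pair, so any
    # mismatch left at the loop end of the stem is moved into the loop.
    while stem and stem[-1][3] == "mismatch":
        stem.pop()
    if not stem:
        return stem, seq
    inner_5, inner_3 = stem[-1][0], stem[-1][1]
    return stem, seq[inner_5 + 1:inner_3]


for n in (16, 17):
    stem, loop = fold_cng_hairpin("U", n)
    mismatches = sum(1 for p in stem if p[3] == "mismatch")
    print(f"(CUG){n}: {len(stem) - mismatches} pairs, "
          f"{mismatches} U-U mismatches, loop {loop} ({len(loop)} nt)")
```

Under these assumptions an even number of repeats gives a 4-nt loop and an odd number gives a 7-nt loop, which matches the two loop sizes mentioned in the text. Slipped alignments, which the model ignores, would shift the register by whole repeat units.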

Crystal structures have been determined for short CUG 
(53,54), CAG (55) and CGG (56) repeats, which form 
intermolecular duplexes. The X-ray structures revealed 
details of the molecular architecture of these duplexes 
that are considered representative for stem portions of 
the CUG, CAG and CGG repeat hairpins. From a struc- 
tural biology perspective, the nature and consequences of 
the periodic U:U, A:A and G:G mismatches were the most 
interesting findings. In the CUG repeat crystal structure, 
the U:U mismatches form stretched wobble interactions 
having only one hydrogen bond between the carbonyl O4
atom of one uracil residue and the N3 imino group of the
opposite U residue (53). Similarly, in the CAG repeats,
only one hydrogen bond is formed between the opposing
adenine residues. This is an unusual and weak C2-H2···N1
bond. All the adenine residues are in the anti conformation
and serve as both hydrogen bond donors and acceptors
(55). In the non-canonical G:G pairs found in
CGG repeat duplexes, one guanosine residue is always
in the syn and the other in the anti conformation, and they
form two hydrogen bonds, O6···N1H and N7···N2H. The
helical structures of CGG repeats are more stable than 
those formed by CAG and CUG repeats (56). 

Finally, an interesting correlation was found on 
comparing the occurrence of different triplet repeats in 
exons (described in a previous section) with their
structures. As presented in Figure 1, TNRs that are strongly
overrepresented in exons belong to the hairpin-forming
repeats, whereas the majority of repeat types that do not
form any stable structure are underrepresented. The
positive selection of repeats capable of forming hairpin 
structures in transcripts may suggest their importance in
the regulation of gene expression, but the selection may 
also be acting at the level of amino acid repeats in 
proteins. 
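
The correlation between exonic representation and structural class can be traced with a small lookup. The sketch below encodes the four classes as listed in this section and applies them to the over- and underrepresented motifs named in the previous section. Two assumptions are made: CCA is taken as the seventh unstructured motif, since it is the only one of the 20 motif classes not otherwise assigned, and motifs are matched regardless of phase. The names `STRUCTURAL_CLASSES` and `structural_class` are hypothetical.

```python
# Four structural classes of triplet repeat RNAs, as listed in the text.
STRUCTURAL_CLASSES = {
    "G-quadruplex": ["AGG", "UGG"],
    "stable hairpin": ["CGA", "CGU", "CAG", "CUG", "CCG", "CGG"],
    "semi-stable hairpin": ["UAG", "AUG", "UUA", "CUA", "CAU"],
    # CCA is an assumption: the only motif class not assigned elsewhere.
    "unstructured": ["CAA", "UUG", "AAG", "CUU", "CCU", "CCA", "UAA"],
}


def canonical_phase(motif: str) -> str:
    """Alphabetically first rotation, so all phases of a repeat coincide."""
    return min(motif[i:] + motif[:i] for i in range(3))


CLASS_OF = {canonical_phase(m): name
            for name, motifs in STRUCTURAL_CLASSES.items()
            for m in motifs}


def structural_class(dna_motif: str) -> str:
    """Structural class of the RNA repeat transcribed from a DNA motif."""
    rna = dna_motif.upper().replace("T", "U")
    return CLASS_OF[canonical_phase(rna)]


# Motifs named in the text (CNG expanded to its four members).
overrepresented = ["CAG", "CCG", "CGG", "CTG", "CGA", "AGG"]
underrepresented = ["CTT", "CAT", "CAA", "TAA", "TTA"]

for label, motifs in [("overrepresented", overrepresented),
                      ("underrepresented", underrepresented)]:
    for motif in motifs:
        print(f"{label:17s} {motif}  {structural_class(motif)}")
```

With this encoding every overrepresented motif falls into a stable structure class, while three of the five robustly underrepresented motifs (CTT, CAA and TAA) are unstructured and the other two form only semi-stable hairpins.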



TRIPLET REPEAT EXPANSION DISEASES AND 
MECHANISMS OF PATHOGENESIS 

Over 20 different genes containing unstable TNRs have 
been implicated in the pathogenesis of human neurologic- 
al diseases collectively named TREDs [reviewed in (1)]. 
Expanded CTG, CGG, GAA and CAG repeats are 
sources of degenerative changes leading to symptoms 
associated with DM1, FXTAS, Friedreich's ataxia 
(FRDA) as well as HD and a series of SCAs (Figure 2). 
These are typically late-onset inherited disorders, and their
causative repeat mutations are located in different parts of
genes; this location primarily determines the number of
potential toxic entities. The adverse effect of non-coding
mutations in DM1 and FXTAS is principally determined by the
expression of a mutant transcript (57-61), whereas the
toxicity of coding mutations is exerted by both RNA and
protein, which harbor abnormally lengthened repeats
(11,62). In two other non-coding
repeat expansion disorders, FRDA and FXS, it is the di- 
minished expression of specific proteins which triggers 
pathogenesis as a result of inhibited or abortive transcrip- 
tion across, respectively, expanded GAA and CGG 
repeats (63-67). 

Among well-studied mechanisms underlying TREDs 
are: (i) toxic RNA gain-of-function caused by transcripts 
harboring expanded CUG, CAG or CGG repeats 
(2,61,68-70); (ii) toxic protein gain-of-function through 
expression of polyglutamine (polyQ) tract encoded by 
mutant CAG repeats (71,72); and (iii) aberrant
loss-of-transcript and loss-of-protein function caused by GAA
and CGG expansions (63-67). However, considering the
results of the most recent reports, one can speculate that
the mechanistic complexity of pathogenesis in TREDs is
higher and more variable. The presence of non-ATG
initiated translation was recently reported by Ranum 
and colleagues (13,73), and bidirectional transcription 
through repeat regions was shown for several genes of 
TREDs (74-76). The bidirectional transcription is not 
only a source of sense and antisense transcripts, but also 
leads to the generation of triplet repeat-derived siRNAs 
targeting transcripts containing complementary repeats as 
shown by the groups of Bonini (77) and Richards (78). 
These results indicate the existence of novel toxic entities 
that may give rise to new potential pathomechanisms. 
Further studies will evaluate their importance to the 
pathogenesis of specific TREDs. 



STRUCTURES OF TRIPLET REPEATS IN TRED 
TRANSCRIPTS 

The involvement of mutant transcripts in the pathogenesis
of DM1, and their possible contribution to the pathogenesis
of FXTAS and polyQ diseases, prompted researchers in
the past decade to undertake a detailed structural
examination of triplet repeat regions in numerous TRED
transcripts. A further argument for undertaking that effort
was the conviction that the treatments of TREDs aimed 
at the allele-specific inhibition of mutant transcript or its 
destruction by direct repeat targeting will benefit from 
having a deeper insight into the structure of the target. 
The structural information gathered for ORNs (44,45) 
and described earlier in this review was insufficient for 
this purpose, as it did not provide answers to questions 
such as the following: what is the effect of repeat length on 
structures formed by repeats? What is the contribution of 
sequences flanking repeats to the structure of the repeat 
region? And what are the structural roles of various repeat 
interruptions? 

The first study aimed at answering these questions was 
performed on the DMPK transcript implicated in DM1 
pathogenesis (79). The study design included: (i) a
selection of representative normal alleles of TRED genes based
on population genotyping results; (ii) size selection of the
repeat region based on RNA structure prediction; and (iii)
PCR synthesis of DNA templates for in vitro transcription,
RNA synthesis, end labelling and structure probing
with the use of nucleases (80) and lead ions (81,82).
Normal length transcripts containing 5, 11 and 21 CUG 
repeats revealed the conversion of a single-stranded repeat 
region into semi-stable slippery hairpins upon increasing 
the repeat length, whereas stable hairpins were formed by 
expanded 49 CUG repeats (79). The finding that 
double-stranded-like structures are formed by CUG 
repeats in expanded transcripts was instrumental to the 
later discovery that muscleblind-like 1 (MBNL1) protein 
is sequestered by mutant DMPK transcripts in DM1 
patient cells (83). 

Similar in vitro structural analysis was further con- 
ducted for the triplet repeat regions of the majority of 
TRED transcripts and revealed their structural diversity. 
The contribution of sequences flanking the repeats to 
repeat hairpin stabilization was shown for ATXN1 (84), 
CACNA1A (85) and FMR1 (86) transcripts (Figure 3A). 
In contrast, flanking sequences did not influence the struc- 
tures formed by repeats in DMPK (79), ATXN2 (87), 
ATXN3 or ATN1 transcripts (85). In HTT and AR
transcripts, neighbouring repeats CCG and CUG,
respectively, interact with CAG repeat tracts to form hairpins
that have a unique composite architecture (88). It
should be recognized, however, that the structures 
determined for triplet repeats in vitro may not fully
recapitulate folding that occurs inside cells in the context of
full-length RNA and various RNA binding proteins. The
intracellular RNA structures and interactions, which
were out of reach for a long time, are now amenable to
investigation, for triplet repeats as well, on a
transcriptome-wide scale. Methods such as global
RACE (89), transcriptome-wide RNA structure probing
(90) and HITS-CLIP (High-Throughput Sequencing of
CrossLinking and ImmunoPrecipitation products) (91)
allow the detection of products of RNA cleavage by
endogenous nucleases or exogenous reagents, and the
determination of high-resolution maps of RNA-protein
interactions in vivo.

Figure 3. Influence of sequences flanking TNRs and repeat interruptions
[...] ATXN1, ATXN2 and FMR1 transcripts strongly influence structure
formation. The interruptions can either break hairpin regularity or
induce the formation of branched structures in which interruptions
are predominantly present in terminal and internal hairpin loops.

In four TRED-related genes i.e. FMR1, ATXN1,
ATXN2 and TBP, the majority of the normal alleles
contain specific interruptions located within the repeat
tracts. These are AGG triplets disrupting CGG repeats
in the FMR1 gene, CAT triplets within CAG repeats in
the ATXN1 gene and CAA interruptions within CAG
repeats in both the ATXN2 and the TBP genes (87).
Such repeat interruptions in DNA have been shown to
function as protective elements, preventing pathogenic
repeat expansion (92). But what could be their functions
in transcripts? RNA structure probing revealed that AGG
interruptions within the CGG repeat of the FMR1
transcript prevent single hairpin structure formation by the
repeats (Figure 3B). Instead, branched hairpins are
formed that have the substituted base either in the side
loop or in an enlarged terminal loop, depending on the
location of the interruption (86). Similar structural roles
were demonstrated for CAU and CAA interspersions in
the ATXN1 (84) and ATXN2 (87) transcripts (Figure 3B)
and were predicted for the TBP transcript (87). It was
hypothesized that the AGG interruptions may protect 
some premutation carriers from being prone to FXTAS 
by shortening the length of the hairpin composed of pure 
CGG repeats (86). In the cases of SCA1 and SCA2, rare 
carriers of expanded interrupted repeats have not de- 
veloped any disease, and the RNA structure of the 
repeat region was found to be better correlated with 
pathogenesis than the length of the polyQ tract (84,87). 
These correlations suggested that RNA hairpin structure 
plays a more general role in the pathogenesis of TREDs. 
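
The idea that interruptions shorten the hairpin built from pure repeats can be illustrated by comparing the longest uninterrupted run in a pure and an interrupted tract of the same total length. This is a minimal sketch: the function name `longest_pure_run` is hypothetical, and the two example tracts (29 triplets each, the second with two AGG interruptions) are illustrative inputs chosen here, not alleles reported in the cited studies.

```python
def longest_pure_run(tract: str, unit: str = "CGG") -> int:
    """Length, in repeat units, of the longest uninterrupted run of `unit`.

    The tract is read in frame from its first nucleotide, three
    nucleotides at a time; any other triplet counts as an interruption.
    """
    tract = tract.upper()
    best = current = 0
    for start in range(0, len(tract) - 2, 3):
        if tract[start:start + 3] == unit:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


# Hypothetical tracts of equal total length (29 triplets).
pure = "CGG" * 29
interrupted = "CGG" * 9 + "AGG" + "CGG" * 9 + "AGG" + "CGG" * 9

print(longest_pure_run(pure))         # 29
print(longest_pure_run(interrupted))  # 9
```

The same helper applies to CAG tracts interrupted by CAT or CAA triplets by passing `unit="CAG"`.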



TRIPLET REPEAT RNA INTERACTION WITH 
PROTEINS 

The protein binding properties of TNR sequences have 
been mostly studied in relevance to the toxic features of 
mutant transcripts rather than in the context of the 
putative normal functions of TNR RNA (83,93,94). 
These studies took advantage of various methods to 
identify proteins that bind repeats and to characterize
these interactions structurally. Most of this research has 
dealt with CUG repeats. First, the CUGBP1 protein was 
identified on the basis of its specific binding to single- 
stranded (CUG)8 incubated in HeLa cell nuclear extract 
(95). CUGBP1 is a member of the CELF (CUGBP and
ETR-3-like factors) protein family, whose members regulate a
number of post-transcriptional RNA processing steps
including alternative splicing (96,97). The electron
microscopy examination of the in vitro binding of recombinant
CUGBP1 to transcripts containing expanded CUG
repeats revealed that the protein localizes only to the
base portion of the CUG repeat hairpin and its binding
is not proportional to CUG repeat length (98). Studies
have shown that CUGBP1 does not co-localize with
mutant transcripts in DM1 cells (99,100).

Swanson and colleagues succeeded in identifying an
RNA-binding protein, homologous to the Drosophila
mbl proteins, that binds to CUG repeats in a
length-dependent manner and regulates alternative
splicing (83). This protein, MBNL1, was later shown to
co-localize with mutant DMPK transcripts in a variety of DM1
patient cells and model organisms (99,101,102). Using
the yeast two hybrid system, MBNL1 was shown to 
bind not only to CUG repeats but also to other types of 
repeated sequences, including CAG repeats (93). A 
filter-binding assay revealed a very similar in vitro 
binding affinity of MBNL1 to mutant CUG and CAG 
repeats (103). Moreover, the analysis of fluorescence 
recovery after photobleaching (FRAP) indicated that the 
affinity of the GFP-MBNL1 fusion protein to long CUG 
and CAG repeats is very similar in transfected cells (104). 
Most recently, HeLa and neuroblastoma cells expressing
5, 30, 70 or 200 exogenous CUG or CAG repeats were
analyzed for colocalization of repeat-containing transcripts
with endogenous MBNL1 (12). Mutant transcripts with
70 or 200 CUG or CAG repeats were found to form 
nuclear inclusions that overlap with MBNL1. Other 
studies investigated the ability of CUG and CAG 
repeats to activate RNA-dependent protein kinase 
(PKR), the known cellular sensor of long dsRNA (105). 
CUG repeats of pathogenic length
showed some activation of the kinase in vitro, and
mutant CAG repeats of the HTT transcript were shown 
to activate PKR in human HD brain tissues (106). 

The architecture of a CUG repeat hairpin of mutant 
length (54 repeats) in complex with MBNL1 was investi- 
gated using chemical and enzymatic structure probing and 
electron microscopy (103). MBNL1 multimers were 
shown to form ring-like structures that bind to the stem 
portion of the CUG repeat hairpin. The structures of very 
short oligomers containing CUG motifs in complex with 
MBNL1 were determined by crystallography, and the 
results suggested that MBNL1 may efficiently bind 
single-stranded CUG repeats (107). Very recently,
MBNL1 was shown to bind (CUG)17 and (CAG)17
ORNs with similar affinity, whereas non-hairpin-forming
repeats of the same lengths composed of AUG or UUA
repeats did not bind MBNL1 under the same assay
conditions (12).



Interactions between CGG repeats and proteins have 
been recently investigated in cellular systems. Various 
cell lines were transfected with plasmids expressing 20, 
40, 60 or 100 CGG repeats, and only the expression of 
long repeats supported the formation of nuclear aggre- 
gates in some but not all of the cell lines tested (108). 
These aggregates were shown to recruit the 
RNA-binding proteins SAM68, hnRNP-G and MBNL1. 
Earlier studies identified a number of other proteins that 
co-localize with long CGG repeats. Hagerman and
colleagues used fluorescence-activated flow sorting, mass
spectrometry and immunohistochemistry to analyze the
protein composition of RNA-containing intranuclear
inclusions formed in astrocytes and neurons of FXTAS
patients (109). These authors identified >20 proteins,
including Lamin A/C, vimentin, hnRNP A2/B1 and
MBNL1. Furthermore, Pur alpha, hnRNP A2/B1 and
CUGBP1 were shown to bind CGG repeats in the
Drosophila model of FXTAS (94,110). The results of the
protein binding studies suggest that MBNL1 sequestration 
by expanded CUG and CAG repeat transcripts is likely 
caused by direct RNA-protein interactions (103), and that
SAM68 sequestration by expanded CGG repeats depends
instead on an indirect interaction (108).



RNA-MEDIATED PATHOGENESIS TRIGGERED BY 
TRIPLET REPEAT EXPANSION 

The ability of mutant CNG repeats to form long hairpin 
structures (44,79,98) is a significant determinant of toxic 
RNA-dependent pathogenesis in a number of TREDs. 
TNRs of mutant length gain a toxic function by binding 
essential proteins and consequently diminishing their func- 
tional cellular levels (Figure 4). A characteristic feature of 
the RNA-protein interactions in mutant TNR-expressing 
cells is the formation of nuclear foci [reviewed in (111)] in 
which sequestered proteins, such as MBNL1-3 and 
SAM68, become immobilized. These aggregates cause 
cells to develop degenerative changes that are manifested 
through misregulated alternative splicing and embryonic 
splicing patterns in adult tissues (108,112-114). In the fol- 
lowing sections, we characterize the toxic effects of nuclear 
aggregates in cells expressing mutant CUG, CGG and 
CAG repeat RNA focusing on their adverse effect on al- 
ternative splicing. 

Cellular toxicity mediated by expanded CUG repeats in 
DM1 and SCA8 

DM1 was the first neurological disorder in which the 
nuclear retention of transcripts containing expanded 
CUG repeats was detected (57,58). Although the reduced 
expression of both DMPK and the neighboring gene SIX5 
accompanies the mutant transcript retention (115), the main
pathogenic mechanism is a deleterious gain-of-function of 
mutant RNA harboring CUG repeats. Over the years, as 
the number of DM1 model organisms has increased, the 
evidence for an RNA-dominant mechanism has grown 
stronger. The characteristic changes associated with
DM1 pathogenesis in skeletal and cardiac muscles have
been reproduced in transgenic mice and flies by expressing
expanded CUG repeats (97,116,117). These similarities
have been correlated with an adverse influence of
mutant transcripts on RNA-binding proteins, i.e.
MBNL1 and CUGBP1, that causes misregulated alternative
splicing in DM1 (Figure 4B) (83,101).

Figure 4. Dominant effects of toxic RNA repeats in different TREDs. (A) Mutant CUG and CAG repeat RNAs form nuclear foci (red) in DM1
and HD human fibroblasts. RNA FISH was performed using fluorescently labelled repeat probes: CAG in DM1 cells and CTG in HD cells; normal
cells were treated with either probe. (B) In normal cells, transcripts with short CNG repeats are exported from the nucleus and translated into
functional proteins (first panel). Thus, the availability of nuclear splicing factors CUGBP1 (blue), MBNL1 (green) and SAM68 (orange) is not
compromised. In DM1 cells (second panel), the CUGmut transcript is retained in the nucleus, where it sequesters MBNL1 and causes the abnormal
phosphorylation of CUGBP1, upregulating its nuclear levels. As a result, the incorrect splicing of several MBNL1- and CUGBP1-dependent
transcripts occurs (examples of misspliced transcripts are specified as green and blue, respectively). In HD cells (third panel), expanded CAG
repeat RNA partially sequesters MBNL1, but does not change the level of CUGBP1, which results in splicing abnormalities affecting some
MBNL1-dependent transcripts. The CAGmut transcript is effectively translated into toxic polyQ protein (polyQ tract is indicated by a red line).
In FXTAS cells (fourth panel), the CGGmut transcript colocalizes in the nucleus with SAM68 and MBNL1, but this colocalization does not occur via
direct RNA-protein interaction. The consequence of this interaction is the aberrant splicing of only SAM68-dependent transcripts (orange).
Moreover, FMR1 mRNA containing long CGG repeats in its 5'-UTR is poorly translated.

To date, several aberrantly spliced transcripts have been 
identified that explain some of the phenotypic features of 
DM1 pathogenesis. For example, skeletal muscle 
hyperexcitability (myotonia) and weakness have been
associated with the mis-splicing of the chloride channel 
(CLCN1) (113,118-120) and the bridging integrator 1 
(BIN1) (121) transcripts. Additionally, splicing alteration 
of the insulin receptor (INSR) contributes to the insulin 
resistance in DM1 muscle fibers (112,122). Defects in the 
cardiac functions are thought to be associated with the 
aberrant splicing of the troponin T type 2 (cTNT) tran- 
script (123-125). Spliceopathy also features a number of 
central nervous system (CNS) transcripts including those 
for the glutamate receptor, ionotropic N-methyl D-aspar- 
tate 1 (NMDAR1/GRIN1), amyloid beta precursor 
protein (APP) and microtubule-associated protein tau 
(MAPT) (68,126). Studies have shown that the 
DM-specific aberrant splicing pattern is reproduced not 
only in mice (102,127) and flies (128,129) expressing the 
DM1 mutation, but also in Mbnl1 ΔE3/ΔE3 knockout mice 
(102) and in CUGBP1 overexpressing mice (127,130,131). 
These findings provide further evidence for the prominent 
roles played by these two splicing factors in DM1 
pathogenesis. 

An RNA gain-of-function mechanism by mutant CUG 
repeats is also implicated in SCA8 pathogenesis. This 
neurodegenerative disease primarily affects the cerebellum 
and is caused by a CTG·CAG repeat expansion that is 
transcribed in both directions and gives rise to the anti- 
sense non-coding CUG-harboring transcripts and 
translated CAG-bearing transcripts (Figure 2). The latter 
transcripts undergo conventional translation that results 
in polyQ protein and non-ATG translation (13) that 
occurs across expanded CAG repeats in all reading 
frames and gives rise to the homopolymeric proteins of 
long polyglutamine, polyserine and polyalanine tracts 
[reviewed in (73,132)]. Whereas protein products of the 
sense transcripts build up as intranuclear inclusions that 
are detected in human brain autopsy tissue and in the 
brains of transgenic SCA8 mice (69), the nuclear retention 
of mutant CUG transcripts is manifested by the formation 
of RNA inclusions that colocalize with MBNL1 in 
selected neurons in the cerebellum. This event is thought 
to affect the alternative splicing of a number of CNS tran- 
scripts including APP, MAPT, NMDAR1 and MBNL1, 
which mimics aberrations observed in DM1 (68). In 
addition, the SCA8 CUG repeat expansion transcripts 
trigger splicing changes and increased expression of the 
GABA-A transporter 4 (GAT4/Gabt4) due to the 
dysregulation of MBNL/CELF regulated pathways in 
the brain (69). 

Toxicity mediated by expanded CGG and CAG repeat 
RNAs 

Strong evidence has been provided for the contribution of 
an RNA gain-of-function mechanism also in the patho- 
genesis of FXTAS and some of the polyglutamine dis- 
orders (11,109). In these diseases, specific proteins are 
sequestered into nuclear foci containing expanded CGG 
and CAG repeats that cause the deviated splicing of 
specific transcripts, which resembles DM1 pathogenesis 
(Figure 4B). In FXTAS, the mis-splicing might affect 
only transcripts regulated by proteins recruited early to 
CGG repeat foci as shown by Charlet-Berguerand and 
colleagues (108). In fact, in mutant CGG-expressing 
cells, the only foci protein whose functional activity is 
compromised is splicing factor SAM68, which is recruited 
to RNA foci before hnRNP G and MBNL1. 
Consequently, the misregulation of pre-mRNA alternative 
splicing controlled by SAM68 is observed in CGG-trans- 
fected cells and in FXTAS patients. An analysis of alter- 
native splicing of the phospho-type-4 ATPase-11B 
(ATP11B) pre-mRNA showed a splicing misregulation 
in SAM68-depleted cells as measured by a significant 
decrease of exon-28B inclusion in comparison with 
control samples (108). 

Experimental evidence has shown that expansions of 
CAG repeats in coding exons may convey pathogenic po- 
tential not only to polyQ proteins, but also to transcripts 
that harbor the repeat mutation. Initially, it was 
demonstrated by Cooper and colleagues that in COSM6 
cells the transient expression of 960 CAG repeats causes 
nuclear retention of the expanded repeats (104). 
Subsequent experimental evidence provided support for 
the toxic capacity of mutant CAG repeat RNA in trans- 
genic mice (11), worms (133) and the SCA3 Drosophila 
model (62) as well as in human SCA3 and HD fibroblasts 
(12). However, because mutant CAG repeat-triggered al- 
ternative splicing abnormalities were discovered very 
recently, the scale of these effects and their roles in patho- 
genesis have yet to be determined (Figure 4B). The most 
recent results from our laboratory show that in human 
HD and SCA3 fibroblasts, the expression of endogenous 
HTT and ATXN3 mutant transcripts causes the forma- 
tion of CAG repeat foci and MBNL1 sequestration 
(12,88), which in turn trigger the aberrant splicing of en- 
dogenous sarco/endoplasmic reticulum Ca2+ ATPase 1 
(SERCA1) and INSR transcripts. Similarly, expanded 
but exogenous CAG repeats mimic CUG repeats in the 
misregulation of alternative splicing, and in HeLa and 
neuroblastoma cells both types of repeat RNA cause 
defects in the processing of SERCA1, INSR, LIM 
domain binding 3 (LDB3) and CLCN1 pre-mRNAs 
(12). These results show that transcripts containing 
expanded CAG repeats may contribute to the pathogen- 
esis of HD, SCA3 and perhaps other polyglutamine 
diseases. 



EXPERIMENTAL THERAPIES DIRECTED AGAINST 
EXPANDED REPEAT RNA 

To date, three main strategies which use expanded repeat 
RNA as a target have been tested as experimental 
therapies for TREDs. The first is based on degrading 
mutant transcripts with RNA interference (RNAi) tools 
or antisense oligonucleotides (AON); the second utilizes 
repeat hairpin-specific small compounds or antisense 
oligomers to inhibit pathogenic interactions between 
repeat RNAs and nuclear proteins; and the third is 
aimed at blocking toxic protein synthesis by binding 
chemically modified antisense oligomers to repeats or by 
targeting repeats with mutant siRNAs acting as miRNAs 
(Figure 5). These strategies are aimed at destroying the 
toxic agents of pathogenic pathways associated with 
TREDs which involve either RNA, protein or both. 

An important issue in therapies for TREDs that involve 
repeat-targeting reagents is the requirement for gene and 
allele selectivity. Although numerous mRNAs in the human 
transcriptome contain triplet repeats, the specific 
inhibition of mutant gene expression 
by targeting repeat regions is a promising therapeutic 
strategy for diseases caused by CAG and CUG repeat 
expansion. Among the features used in designing selective 
reagents are differences between normal and mutant 
transcripts related to their repeat sequence lengths and 
structural properties as well as, in the case of DM, cellular 
localization. The intended targets, i.e. expanded repeats, 
are likely to form hairpin structures in cells, which makes 
them both more prone (increased repeat length) and more 
resistant (more stable structure) to interaction with 
targeting reagents in comparison with normal repeats. The 
necessity of selective inhibition of mutant allele expression 
may not be equally important for all polyQ diseases; 
nevertheless, preserving the expression of the protein 
from the normal allele seems to be an advantageous feature 
of any therapeutic strategy. 

Below, we describe the use of AON and RNAi reagents 
as well as that of small compounds that specifically bind to 
some repeat hairpins. According to their mechanism of 
action, antisense reagents can be divided into two cate- 
gories: 'cutters', which bind to complementary targets 
and induce their cleavage taking advantage of RNaseH 
or Argonaute 2 (AGO2) activities (AON or siRNA, re- 
spectively), and 'blockers', which are oligomers that bind 
to complementary targets, either alone or within protein 
complexes, but do not induce the cleavage of their com- 
plementary target. Examples of 'blockers' include peptide 
nucleic acids (PNAs), locked nucleic acids (LNAs), 
morpholino and miRNA-like acting duplexes (Figure 5). 
The exact mechanisms by which different reagents 
described below exert their activity are not yet known in 
detail. 

Figure 5. Therapeutic strategies aimed at targeting expanded triplet repeats in TRED transcripts. The left panel shows the main pathogenic agent in 
diseases caused by expanded CUG or CAG repeats, which is either toxic RNA structure or toxic polyQ protein (both marked red). The middle and 
right panels show the application of therapeutic AON, RNAi reagents and small compounds (indicated by a blue line) that specifically bind to CUG 
or CAG repeat hairpins. Antisense reagents (AON) or siRNA called 'cutters' induce RNA degradation via RNaseH or AGO2 (middle panel), 
whereas 'blockers' (PNA, LNA, morpholino and miRNA-like duplexes) bind to repeat RNA alone or within protein complexes and inhibit either 
pathogenic RNA/protein interaction or translation of toxic polyQ proteins (right panel). See text for more details. 

The degradation of mutant transcripts induced by 
antisense oligonucleotides and RNAi triggers 

Therapies for polyQ expansion diseases are aimed primar- 
ily at reducing the level of the mutant protein. In addition, 
mutant transcript downregulation may be desirable 
because of its involvement in pathogenesis. RNAi 
reagents require ~20 nt of complementary sequence for 
efficient silencing, which corresponds to only 7 CAG repeats. 
While the normal alleles of CAG-bearing transcripts 
usually have 10-20 repeats, their mutant versions 
contain 40-100 CAG repeats, meaning that transcripts 
from both alleles can be targeted by triplet repeat 
siRNA duplexes. Moreover, among annotated human 
mRNAs alone, there are at least 50 transcripts containing 
CAG repeats and 30 with CUG repeats (29) that are 
potential targets for complementary AONs and RNAi 
reagents. 
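
The length argument above can be made concrete with a little arithmetic. The following Python sketch assumes a hypothetical 21-nt guide strand composed purely of repeats (7 triplets) and an uninterrupted repeat tract, and counts the in-frame registers at which the guide can pair along its full length. The function name and the pure-repeat assumption are illustrative only and are not taken from the cited studies.

```python
def full_match_registers(tract_repeats: int, guide_repeats: int = 7) -> int:
    """Count the in-frame positions at which a guide of `guide_repeats`
    triplets can pair along its full length with an uninterrupted tract of
    `tract_repeats` triplets. Returns 0 if the tract is shorter than the guide."""
    return max(0, tract_repeats - guide_repeats + 1)


# Repeat numbers quoted in the text: normal alleles 10-20, mutant alleles 40-100.
examples = [
    ("normal allele, 10 repeats", 10),
    ("normal allele, 20 repeats", 20),
    ("mutant allele, 40 repeats", 40),
    ("mutant allele, 100 repeats", 100),
]
for label, n_repeats in examples:
    print(f"{label}: {full_match_registers(n_repeats)} full-length registers")
```

Under these assumptions every allele in the quoted ranges offers at least one fully complementary site, so repeat length alone cannot be expected to give all-or-none discrimination; the expanded tract simply offers more registers.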

RNA duplexes composed of CAG and CUG repeat 
strands have been tested in cell culture for their ability 
to silence HTT, AR, ATXN1 and ATXN3 transcripts. 
The reagents used included 81-bp long synthetic CAG/ 
CUG RNA duplexes (134), 21-bp siRNA duplexes (135- 
137) and shRNA (138), and all showed only a slight 
silencing preference for the mutant allele over the normal 
allele. Both strands of CAG/CUG siRNA were found to 
be active and also silenced other transcripts containing 
complementary repeats, causing a considerable loss of 
viability of human fibroblasts (137). The CAG/CUG 
siRNA also induced some toxicity in two Drosophila 
models co-expressing elevated levels of expanded CTG 
and CAG repeats (77,78). Interestingly, the cell culture 
experiments showed that hairpin structures which are 
likely formed in cells by expanded CAG repeats did not 
inhibit the activity of RNAi machinery directed at the 
repeat region (88,137). As a result, mutant HTT tran- 
scripts were efficiently targeted by CAG/CUG repeat 
siRNAs and the repeated CAG sequence was cleaved at 
numerous positions by AGO2 loaded with a CUG repeat 
guide strand (88). 

In DM1, the degradation of expanded CUG repeat 
transcripts has been considered as a possible 
therapy (139). To destroy toxic RNA in cells, several 
types of antisense reagents directed against CUG repeats 
have been successfully tested. Retroviral expression of 
a 149-nt RNA complementary to part of the DMPK 3'- 
UTR in DM1 myoblasts resulted in 80% silencing of 
the mutant DMPK transcript and a 50% reduction of the 
normal transcript, leading to the partial restoration of 
some normal myoblast functions (140). Recently, 
Furling and colleagues (141) engineered AONs by the 
covalent linking of RNA sequences composed of 7, 11 
or 15 CAG repeats to human U7 small nuclear RNA 
for their delivery exclusively to the cell nucleus. The 
lentiviral transduction of DM1 muscle cells with such con- 
structs caused the specific silencing of ~60-70% of mutant 
DMPK mRNA without affecting normal DMPK tran- 
scripts. In the treated cells, the number of nuclei contain- 
ing toxic mutant CUG repeat foci was reduced and the 
alternative splicing of several DM1-affected genes was sig- 
nificantly corrected (141). 

In another study, Wansink and colleagues (142) 
designed a short single-stranded (CAG)7 2'-O-methyl 
phosphorothioate oligonucleotide (PS58) to degrade tran- 
scripts containing expanded CUG repeats. Such chemical 
modification of antisense oligomers is known to stabilize 
the duplex formed with a target sequence and increase the 
resistance of AONs to cellular nucleases. PS58 decreased the 
level of mutant DMPK transcripts by 50-90% in patient 
myoblasts and in two DM1 mouse models, DM500 and 
HSA LR. Interestingly, the levels of other transcripts con- 
taining short CUG repeats remained almost unchanged. 
Local administration of this AON into mouse skeletal 
muscles reduced CUG repeat foci formation, partially 
restored the alternative splicing of DM-specific exons 
and significantly reduced myotonia (142). 

Antisense 'blockers' targeting triplet repeat RNA 

Another straightforward therapeutic strategy is targeting 
expanded CAG repeats in mutant transcripts to decrease 
the production of toxic polyQ protein. Corey and col- 
leagues have shown the allele-selective inhibition of toxic 
protein translation in HD and SCA3 cells, using antisense 
PNA and LNA oligomers composed of 7 CTG repeats 
(136). After strong binding to complementary sequences, 
these oligomers formed an impassable translational 
barrier only in mutant transcripts (Figure 5). REP19N, 
containing 19 PNA residues and modified by addition of 
lysine residues at both termini, was the best allele- 
discriminating reagent, with a specificity of inhibition at 
least 10 times greater for mutant HTT protein translation 
than for normal protein. Moreover, the translation of 
other transcripts containing CAG repeats was not in- 
hibited significantly in PNA- and LNA-treated cells. 
PNA oligomers were also tested for mutant ATXN3 trans- 
lation inhibition; however, the maximum selectivity 
achieved (<5) was lower than that observed for HTT 
(136). Oligomers with other chemical modifications such 
as 2',4'-constrained ethyl (cEt), carba-LNA, 2'-O- 
methoxyethyl (2'-MOE) or 2'-fluoro have also been 
shown to be promising blockers of mutant huntingtin 
translation (143). 

Another approach to RNA repeat-targeted inhibition of 
mutant protein synthesis is the application of CAG/CUG 
siRNA duplexes containing base substitutions at specific 
positions causing mismatches with their mRNA targets. 
This approach was tested in cultured HD fibroblasts 
(137,144) where several duplexes caused the efficient trans- 
lational inhibition of mutant huntingtin without 
downregulation of its transcript. CAG/CUG duplexes 
that form one or more mismatches with target sequences 
most likely act as miRNAs despite the fact that their 
target is located in the ORF. In the most efficient and selective 
duplexes, the mismatched bases were present in the central 
positions of the duplex (144) or in its 3' half (137). 
Interestingly, transcriptional activation of the normal 
HTT allele triggered by some of these duplexes occurred 
concomitantly with the silencing of the mutant allele (137). 
This approach was also successfully used to target the 
CAG repeat region of the mutant ATXN3 transcript, 
although the selectivity of the reagents was lower than 
that observed for HTT inhibition (145). Efforts 
continue to develop this strategy further and to 
obtain even more selective reagents. 
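
The mismatch design described above can be illustrated with a short sketch. The Python snippet below pairs a guide strand antiparallel with a CAG repeat target site and reports the guide positions that fail to form Watson-Crick pairs. The 21-nt guide length, the G-to-A substitution at guide position 9 and the function name are hypothetical choices made for illustration; they are not the duplexes tested in the cited studies (137,144), and G-U wobble pairing is deliberately ignored.

```python
from typing import List

# Watson-Crick partners for RNA bases (G-U wobble pairs are not considered).
WC_PARTNER = {"A": "U", "U": "A", "G": "C", "C": "G"}


def mismatch_positions(guide: str, target_site: str) -> List[int]:
    """Return 1-based guide positions (counted from the guide 5' end) that do
    not form a Watson-Crick pair with the target site.

    Both strands are written 5'->3' and are paired antiparallel, so guide
    position i faces target position len(target_site) - i + 1.
    """
    if len(guide) != len(target_site):
        raise ValueError("guide and target site must have equal length")
    facing_bases = reversed(target_site)
    return [
        position
        for position, (g, t) in enumerate(zip(guide, facing_bases), start=1)
        if WC_PARTNER[g] != t
    ]


target_site = "CAG" * 7      # 21-nt stretch of a CAG repeat tract
perfect_guide = "CUG" * 7    # fully complementary CUG repeat guide

# Hypothetical design: replace the G at guide position 9 with A.
bases = list(perfect_guide)
bases[8] = "A"
mismatched_guide = "".join(bases)

print(mismatch_positions(perfect_guide, target_site))     # []
print(mismatch_positions(mismatched_guide, target_site))  # [9]
```

A helper of this kind only locates mismatches; whether a given position shifts a duplex from siRNA-like cleavage to miRNA-like translational inhibition remains an experimental question.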

The primary pathogenic effect of expanded CUG 
repeats in DM1 is the sequestration of nuclear proteins, 
including MBNL1. Therefore, preventing pathogenic 
RNA-protein interactions is another approach to 
reducing the toxic effects of mutant RNA. Proof of 
principle for this concept came from two independent 
studies (146,147). Swanson and colleagues showed that 
the overexpression of MBNL1 in expanded CUG 
repeat-expressing HSA LR mice can reverse the DM-like 
phenotype (146). Thornton and colleagues, using an anti- 
sense morpholino oligomer composed of 25 CAG repeats 
(CAG-25), inhibited the MBNL1/RNA interaction and dis- 
rupted the complexes preformed in vitro (147) (Figure 5). 
The morpholino modification provides resistance to 
cellular nucleases and high-stability duplexes with comple- 
mentary targets, and does not induce the cleavage of 
targeted RNA. These features made it a promising agent 
for in vivo therapy. The local administration of CAG-25 
into the skeletal muscles of DM1 HSA LR mice expressing 
(CUG)250 RNA resulted in molecular and pheno- 
typic changes manifested through the significant reduction 
or elimination of nuclear CUG repeat foci and a nearly 
complete reversion of abnormal splicing. These changes 
were driven by the release of MBNL1 protein from seques- 
tration and the restoration of ion channel function leading 
to a significant reduction of myotonia. Moreover, in 
treated muscles, the mutant transcript was efficiently ex- 
ported from the nucleus and translated in the cytoplasm. 
Importantly, the researchers found that the in vivo bene- 
ficial molecular effects were observed for as long as 14 
weeks after a single injection and the expression of other 
transcripts containing short CUG repeats was not affected 
in skeletal muscle treated with CAG-25 (147). 

Among the advantages of using repeat-targeting 
reagents, both 'cutters' and 'blockers', the most important 
is their potential application in several expansion-driven 
diseases. In contrast, widely exploited gene-specific and 
SNP-based strategies are applicable to only a fraction 
of patients suffering from specific disorders. On the 
other hand, the main challenge for repeat-targeting 
strategies is directing the reagent specifically to the 
mutant allele and leaving other repeat-containing tran- 
scripts intact. At this time, all potential mRNA off-targets 
are known (29), as discussed in the 'Triplet Repeats are 
Frequent Motifs in Human Transcripts' section of this 
review, and their representatives may be tested along 
with testing the allele selectivity of silencing disease-causing 
genes. 
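
A candidate list of repeat-containing off-targets of the kind mentioned above could, in principle, be assembled by scanning transcript sequences for uninterrupted repeat tracts. The Python sketch below shows one simple way to do this with a regular expression. The threshold of 7 repeats (echoing the ~20 nt needed for efficient RNAi), the function name and the toy sequences are assumptions made for illustration; this is not the procedure used to compile the data in (29).

```python
import re
from typing import Iterator, Tuple


def find_repeat_tracts(sequence: str, unit: str = "CAG",
                       min_repeats: int = 7) -> Iterator[Tuple[int, int]]:
    """Yield (start, n_repeats) for each uninterrupted tract of `unit`
    containing at least `min_repeats` copies. `start` is a 0-based index."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    for match in pattern.finditer(sequence.upper()):
        yield match.start(), (match.end() - match.start()) // len(unit)


# Toy RNA sequences (hypothetical, not real transcripts).
toy_transcripts = {
    "tx_short": "AUG" + "CAG" * 5 + "UAA",
    "tx_long": "AUG" + "CAG" * 12 + "CCG" + "CAG" * 8 + "UAA",
}

for name, seq in toy_transcripts.items():
    print(name, list(find_repeat_tracts(seq)))
# tx_short []
# tx_long [(3, 12), (42, 8)]
```

Interrupted tracts are reported as separate entries in this sketch; a real screen would also need to decide how to treat interruptions and tracts in untranslated regions.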

Small compounds that bind specifically to repeat 
RNA hairpins 

Another strategy considered for the treatment of some 
TREDs is based on the identification of small compounds 
that interfere with pathogenic interactions between 
expanded RNA repeats and proteins. Different groups 
have used various approaches to identify ligands that spe- 
cifically bind CUG and CAG repeat hairpins (148-153). 
Screening a library of ~11 000 compounds yielded a few 
molecules that showed selectivity for binding to either 
short or expanded CUG repeat hairpins (148). These 
ligands were able to prevent the CUG repeat/MBNL1 
interaction in vitro with a low micromolar inhibition 
constant. In another work, pentamidine and neomycin B 
were shown to inhibit the interaction of short CUG repeat 
RNA with MBNL1 in vitro (152). However, only the 
former drug reversed the mis-splicing of two 
DM1-affected transcripts and released MBNL1 from 
nuclear inclusions in cells expressing 960 CUG repeats. 
Additionally, high doses of pentamidine administered by 
intraperitoneal injection into HSA LR mice partially 
reversed the mis-splicing of Serca1 and Clcn1 
pre-mRNAs (152). In the most recent work, a D-amino 
acid hexapeptide (ABP1) was selected from a combinator- 
ial peptide library screen in a DM1 Drosophila model (153). 
In vitro, ABP1 induced a switch of CUG hairpins to a 
single-stranded conformation, whereas in Drosophila it 
reduced CUG foci formation and suppressed 
CUG-induced lethality. An intramuscular injection of 
ABP1 into DM1 HSA LR mice reversed muscle histopath- 
ology and mis-splicing of Serca1 and Tnnt3. 

The important issue regarding such compounds is their 
binding specificity for RNA structures formed by repeated 
sequences. To address this issue, Disney and colleagues 
designed several multivalent ligands containing between 
two and five Hoechst 33258 or kanamycin A derivatives 
attached to a peptoid backbone (Figure 5) (150,151,154). 
The modularly assembled ligands differed in the distance 
between RNA-binding modules. All ligands were tested 
both for their binding to CAG and CUG repeat tran- 
scripts and for the inhibition of repeat RNA/MBNL1 
interactions. The best multivalent ligands inhibited the 
formation of RNA-protein complexes with inhibition 
constants falling within the low nanomolar range. These 
promising compounds have not yet been tested in 
cellular or animal systems expressing expanded repeats; 
however, their cell permeability has been demonstrated 
(151). 



FINAL REMARKS AND FUTURE PERSPECTIVES 

In this article, we reviewed several aspects of research on 
triplet repeat RNA. The occurrence of TNRs in human 
transcripts was presented and the structural features of 
both normal and pathogenic repeats were described. The 
role of mutant TNRs as triggers of disease pathogenesis 
and targets in experimental therapies for TREDs was dis- 
cussed wherever relevant, from the perspective of RNA 
structure. 

At present, all annotated human mRNAs containing 
triplet repeat tracts have been identified. However, only 
a fraction of these transcripts is relatively well 
characterized in terms of their TNR length polymorphism 
and tissue expression. There is a much larger gap in our 
knowledge of TNR RNAs if we take into consideration 
various antisense transcripts from human genes as well as 
sense and antisense transcripts from intergenic regions. 
Nevertheless, even this gap may be filled soon as relevant 
data may be mined by bioinformatics from several 
recently completed and on-going large-scale sequencing 
and expression profiling projects. We foresee that a clearer 
and nearly complete view of the human TNR RNA 
repeatome is on the horizon and that this information 
may become available soon for more focused research. 

The efforts of several laboratories over the past decade 
have resulted in an advanced structural characterization of 
TNR RNA, which was achieved using biochemical and 
biophysical methods. TNRs may now be considered a 
part of the human RNome that is among the best 
characterized structurally. We know which TNR types 
form G-quadruplex structures, which tend to form more 
stable and less stable hairpins and which are reluctant to 
form any higher order structures. We also know that re- 
peats with hairpin-forming potential are over- 
represented in exons and therefore are likely implicated in 
some specific biological functions. At present, however, 
the normal functions of TNRs in transcripts are very 
poorly understood. 

The specific functions of TNRs in RNA need to be ex- 
pressed via interactions with proteins. But only a few 
proteins that interact directly with normal length TNR 
RNA have been identified thus far and their interactions 
characterized. As described in this review, the length of 
TNR RNA influences its protein binding properties and 
only mutant TNRs are efficient in specific protein seques- 
tration. This raises a more general question about the 
protein binding potential of normal TNR sequences. 
One possible answer is that this potential is low and 
only a few proteins are capable of engaging in such 
interactions. The alternative answer is that this potential is 
higher, but no systematic study has been conducted thus far to 
reveal it. In our opinion, further research focused on 
finding additional proteins that act on relatively short 
normal repeats and on expanded pathogenic repeats 
may help in a better understanding of the normal roles 
of these sequences, and possibly in identifying new 
pathways of pathogenesis in TREDs. High-throughput 
analyses of triplet repeat RNAs associating with proteins 
are also encouraged to identify their transcriptome-wide 
interaction maps in cells. 

Looking forward to any future structural studies of 
TNR RNA, we predict that recent successes at resolving 
crystal structures of short TNR duplexes may stimulate 
efforts to determine high-resolution structures of longer 
repeat TNRs and their complexes with various biological- 
ly and therapeutically relevant ligands. Further progress in 
this direction may help to better understand the 
RNA-triggered pathogenic mechanisms in TREDs, and 
promote a rational design of repeat-targeting therapeutic 
agents. 

Recent years have witnessed rapid progress in develop- 
ing experimental therapies for TREDs in various cellular 
and animal model systems. Several different concepts and 
approaches have been successfully tested, making clinical 
trials on humans a realistic prospect. Repeat-targeting 
antisense reagents and RNA interference reagents acting 
in miRNA fashion seem to be the most promising treat- 
ment options. Efforts are now under way to better char- 
acterize such reagents and optimize their gene selectivity 
and allele selectivity, and minimize sequence-specific and 
non-specific off-target effects. 